
 

DMP-8: Data and metadata verification

 

 

The concept

 

Data and associated metadata held in data management systems will be periodically verified to ensure integrity, authenticity and readability.

 

Related terms: Authenticity, Integrity, Readability.

 

Category: Preservation

 

Explanation of the principle

 

Important among the actions performed by TDRs, described above in DMP-7, is the periodic checking and transformation (file migration) of data to ensure that they do not become obsolete. Constant and careful maintenance of the preserved data sets (data and associated knowledge) is necessary to ensure data integrity, authenticity, readability, and thus usability over the long term. The curation and maintenance of Archives and Data Management Systems consist of all the activities aimed at guaranteeing the integrity, authenticity and readability of the archived and preserved data. This covers the storage of equipment, media and hard disk arrays in secured and environmentally controlled rooms, and a set of defined activities to be performed on a routine basis, such as migration to new systems and media in step with the evolution of technology and the consumer market, data compacting, and data format/packaging conversion. Data holders and archive owners need to design a maintenance scheme for their Archives and Data Management Systems to guarantee the integrity of the archived and collected data.
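As an illustration of the routine migration activity described above, the following minimal Python sketch copies a file to new storage and confirms that the copy is bit-identical before it is accepted. The function names (`sha256_of`, `migrate_with_verification`) and the choice of SHA-256 are assumptions made for this example; the guidelines do not prescribe a particular checksum algorithm or tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_verification(source: Path, target_dir: Path) -> Path:
    """Copy `source` into `target_dir` (hypothetical new storage location)
    and raise an error if the copy does not match the original."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    expected = sha256_of(source)      # checksum recorded before the copy
    shutil.copy2(source, target)      # copy2 also tries to keep timestamps
    if sha256_of(target) != expected:
        target.unlink()               # do not keep a corrupt copy
        raise IOError(f"checksum mismatch after migrating {source}")
    return target
```

In this sketch the original is left in place; whether and when the old copy is retired would be decided by the archive's own maintenance scheme.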

 

Guidance on Implementation, with Examples

 

1. Archived data refreshment: Periodically migrate the archived data (“media refreshment”) to the most suitable proven storage technology, to preserve access to the data. Technology selection should be based not only on technical and cost aspects, but should also aim to minimize environmental impact (e.g. power consumption and thermal dissipation).

 

2. Archived data formats description: Provide a formal description of old archiving formats to allow conversion to new standard formats, which will increase technical compatibility and reduce the diversity of formats and interfaces between archives.

 

3. Archived data duplication: Maintain identical copies of all archived data applying one of the security levels defined below:

  • a. Dual copy in the same geographical location (but different buildings) to avoid data loss due to media degradation or obsolescence, or

  • b. Dual copy in the same geographical location (but different buildings) based on different technologies, to avoid failures inherent to a single technology, or

  • c. Dual copy in two different geographical locations to safeguard the archive from external hazards (e.g. floods and other natural or technological hazards), or

  • d. Dual copy in two different geographical locations, based on different technologies, to avoid failures inherent to a single technology.

 

4. Archive system components migration (hardware): Periodically migrate archive system components to new hardware platforms.

 

5. Media readability and accessibility tests: Periodically test media readability and accessibility on a representative set of the archived data.

 

6. Archive content integrity: Periodically verify the integrity of the archive collection/content through integrity checks on a representative set of the archived data.

 

7. Data content integrity: Ensure that archived content and associated information remain unchanged and, if changes are made, that these are documented, and that this documentation is preserved and made available as well (provenance information).
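The readability and integrity checks on a representative set (items 5 and 6) can be sketched as a sampled audit against a checksum manifest. This is only one possible approach, under stated assumptions: the manifest is a plain mapping from relative file path to SHA-256 digest, a uniform random sample is treated as "representative", and the names `build_manifest` and `audit_sample` are hypothetical. Real archives may use other fixity tools, sampling schemes, or manifest formats.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file under `root` (done once, at ingest)."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit_sample(
    manifest: Dict[str, str],
    root: Path,
    sample_size: int,
    seed: Optional[int] = None,
) -> Dict[str, str]:
    """Check a random sample of manifest entries.

    Returns a mapping {relative_path: problem}; an empty mapping means
    every sampled file could be read and matched its recorded checksum.
    """
    rng = random.Random(seed)
    names = sorted(manifest)
    sample = rng.sample(names, min(sample_size, len(names)))
    problems: Dict[str, str] = {}
    for name in sample:
        try:
            actual = sha256_of(root / name)   # readability test
        except OSError:
            problems[name] = "unreadable"
            continue
        if actual != manifest[name]:          # integrity test
            problems[name] = "checksum mismatch"
    return problems
```

A periodic job could call `audit_sample` on each copy of the archive and record the result, so that the audit itself becomes part of the documented provenance information mentioned in item 7.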

 

Metrics to measure level of adherence to the principle

 

The level of adherence can be measured against the Data Preservation Guidelines in point C above, or against ISO 16363:2012 (Space data and information transfer systems: Audit and certification of trustworthy digital repositories; CCSDS 652.0-M-1), the standard used to assess the trustworthiness of a generic digital repository.

 

Resource Implications of Implementation

 

Estimating the resource cost of long-term digital preservation has received much attention from organisations interested in preserving their data (e.g. companies, digital libraries, research data centres). The cost depends on the organisation and on the data to be preserved (e.g. volume, format), and can therefore only be modelled here. Cost modelling techniques are used to estimate the costs involved in digital asset preservation and their economic impact on the organisation. Generic cost models follow two main steps:

 

1. Identifying resource costs and activities

Activities identified for the archiving process include managing storage, refreshment, migration, reporting, backup, reformatting/repackaging, and test and integrity verification, as well as reporting on archived data formats. Resources needed to complete the cost analysis include human resources and equipment, office/work space, IT services and technology, and other utilities. Usability and integrity are the core parameters for quantifying impact.

 

| Activities | Parameters | Impact |
| --- | --- | --- |
| Manage Storage | Usability (Readability, Authenticity); Integrity | This activity is very important in order to ensure the physical preservation of digital data and consequently the physical access to it, that is to maintain data and technologies (HW, SW) used for accessing the data. If this activity is incorrectly performed, the risk of losing the data, as well as the ability to access the data, is very high. |
| Manage Refreshment | | |
| Manage Migration | | |
| Manage Reporting | | |
| Manage Backup | | |
| Manage Reformatting/Repackaging | | |
| Manage Test and Integrity Verification | Usability (Readability, Authenticity); Integrity | It is very important in order to ensure the physical preservation of digital data and consequently the physical access to it, and its availability over time. Without such activities, the data can be lost in the long term, without the possibility to recover it or, if not correctly managed, the access to data could be lost. |
| Report on archived data format | Integrity | These activities are relevant in order to ensure the traceability of each action on the data. This can support the integrity and completeness of data and information provided to the data users. |

 

2. Assigning resource costs to activities, and assigning activity costs to cost objects

This step should be carried out using simulated and estimated values.
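The two-stage assignment (resources to activities, then activities to cost objects) can be sketched in a few lines of Python. Everything numeric here is invented purely for illustration: the resource names, the annual cost figures, the allocation shares, and the choice of two data sets as cost objects are all assumptions, not values from the guidelines. The helper `allocate` is likewise hypothetical.

```python
import math
from typing import Dict


def allocate(
    costs: Dict[str, float],
    shares: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Distribute each source cost across targets by fractional share."""
    totals: Dict[str, float] = {}
    for source, cost in costs.items():
        fractions = shares[source]
        # Each source must be fully allocated (shares sum to 1).
        assert math.isclose(sum(fractions.values()), 1.0), source
        for target, fraction in fractions.items():
            totals[target] = totals.get(target, 0.0) + cost * fraction
    return totals


# Step 1 input: estimated annual resource costs (made-up figures).
resource_costs = {"staff": 100_000.0, "storage_hardware": 40_000.0}

# Assumed share of each resource consumed by each archiving activity.
resource_to_activity = {
    "staff": {
        "Manage Storage": 0.5,
        "Manage Test and Integrity Verification": 0.5,
    },
    "storage_hardware": {"Manage Storage": 0.75, "Manage Backup": 0.25},
}
activity_costs = allocate(resource_costs, resource_to_activity)

# Step 2: assumed share of each activity attributable to each cost object
# (here, two hypothetical data sets).
activity_to_dataset = {
    "Manage Storage": {"dataset_A": 0.5, "dataset_B": 0.5},
    "Manage Test and Integrity Verification": {"dataset_A": 1.0},
    "Manage Backup": {"dataset_B": 1.0},
}
dataset_costs = allocate(activity_costs, activity_to_dataset)

print(activity_costs)
print(dataset_costs)  # {'dataset_A': 90000.0, 'dataset_B': 50000.0}
```

Because every share set sums to one, the total cost is conserved across both stages (140 000 in this made-up example), which gives a simple sanity check when the shares are replaced by an organisation's own simulated or estimated values.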

 

 

Text extracted from the Data Management Principles Implementation Guidelines